High-throughput primer and probe design

Xiaowei Wang and Brian Seed



The design of appropriate primers for the specific quantitative generation of
DNA amplicons from diverse transcripts is an important requirement for
most applications of real-time PCR. For the simultaneous measurement of
multiple transcripts in a single experiment, an additional constraint, that
all reactions proceed efficiently under the same conditions, must be
imposed. In the first section of this chapter, we describe general primer and
probe design guidelines for real-time PCR. In the second part, we present an
online real-time PCR primer database encompassing most human and
mouse genes identified to date. The primer database contains primers that
perform well under a single set of conditions, allowing many simultaneous
determinations of mRNA abundance to be carried out.

5.1 Primer and probe design guidelines

5.1.1 Primer specificity

Non-specific amplification is one of the greatest challenges for the
successful deployment of real-time PCR methods intended to be used for the
validation or discovery of transcript abundance variation. In addition to the
sequence of interest, many thousands of other sequences can be expected to
be present in such applications. The design of PCR primers for this purpose
should therefore take into account the potential contribution of all possible
off-target template sequences, in order to prevent mispriming. This is
usually achieved by comparing sequence similarity between the primers
and all other template sequences in the design space.
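
The similarity comparison described above can be sketched as a search for long
contiguous matches between a primer and each off-target sequence. This is only
an illustration under stated assumptions: the function names and the 14-base
threshold are hypothetical and are not taken from this chapter, and a real
screen over a whole transcriptome would use an indexed search tool rather than
the brute-force scan shown here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def longest_shared_run(primer: str, template: str) -> int:
    """Length of the longest contiguous stretch of `primer` found in `template`."""
    primer, template = primer.upper(), template.upper()
    best = 0
    for start in range(len(primer)):
        # Only test substrings longer than the best run found so far.
        end = start + best + 1
        while end <= len(primer) and primer[start:end] in template:
            best = end - start
            end += 1
    return best


def is_specific(primer: str, off_targets, max_run: int = 14) -> bool:
    """True if no off-target strand shares more than `max_run` contiguous bases.

    `max_run` is an illustrative threshold (an assumption), not a published value.
    """
    for template in off_targets:
        for strand in (template, reverse_complement(template)):
            if longest_shared_run(primer, strand) > max_run:
                return False
    return True
```

Both strands of each off-target sequence are checked because a primer can
anneal to either strand of a double-stranded template.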

5.1.2 Primer length

PCR primers are typically 16-28 nucleotides long. If the primer is too short,
it is difficult to design gene-specific primers and to choose an optimal
annealing temperature. On the other hand, very long oligos unnecessarily
increase oligo synthesis cost. In addition, longer primers are more likely to
form secondary structures that result in decreased PCR efficiency or promote
primer dimer formation, since the primers constitute the nucleic acid
sequences at highest concentration in the reaction.

5.1.3 Primer GC content 



In most PCR applications the primer GC content lies between 35% and
65%. If the GC content is too high, mispriming frequently results, because
even a short stretch of oligo sequence may form a stably annealed duplex
with non-target templates. On the other hand, very low GC content may
result in poor primer binding, leading to decreased PCR efficiency.
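
As a minimal sketch, the length and GC content ranges quoted in Sections 5.1.2
and 5.1.3 can be applied as a first-pass filter on candidate primers. The
function names are hypothetical; the default limits are the ranges given in
the text.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C residues in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def passes_basic_filters(primer: str,
                         min_len: int = 16, max_len: int = 28,
                         min_gc: float = 35.0, max_gc: float = 65.0) -> bool:
    """Check primer length (16-28 nt) and GC content (35-65%)."""
    if not min_len <= len(primer) <= max_len:
        return False
    return min_gc <= gc_percent(primer) <= max_gc
```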

5.1.4 Primer 3' end stability

The 3' end residues contribute strongly to non-specific primer extension by
Taq DNA polymerase, especially if the binding of these residues is relatively
tight to the non-target template. Therefore, primers with very high 3'
terminal stability should be rejected. The binding stability can be calculated
from the free energy profile (ΔG). Typically the more computationally
demanding aspects of calculating free energy, such as loop entropy, can be
ignored because of the unfavorable energetics of opening small loops (internal
denaturation) compared to end melting.
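
One way to estimate 3' terminal stability is to sum nearest-neighbor free
energies over the last few bases of the primer. The sketch below assumes the
unified ΔG°37 values (kcal/mol) as commonly tabulated from SantaLucia (1998);
they should be verified against the original table before use. The five-base
window and the -8.0 kcal/mol rejection threshold are illustrative assumptions,
not values from this chapter, and initiation terms are ignored.

```python
# Nearest-neighbor free energies at 37 C in kcal/mol, keyed by the 5'->3'
# dinucleotide on the primer strand (assumed values, see lead-in).
DG37 = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def three_prime_dg(primer: str, n_bases: int = 5) -> float:
    """Sum of nearest-neighbor free energies over the last `n_bases` residues."""
    tail = primer.upper()[-n_bases:]
    return sum(DG37[tail[i:i + 2]] for i in range(len(tail) - 1))


def has_overstable_3prime_end(primer: str, min_dg: float = -8.0) -> bool:
    """True if the 3' end binds more tightly than the (assumed) threshold."""
    return three_prime_dg(primer) < min_dg
```

A more negative value indicates tighter binding of the 3' end, so primers
flagged by `has_overstable_3prime_end` would be candidates for rejection.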

5.1.5 Primer sequence complexity 

Low-complexity sequences should be identified and discarded during
primer design, due to the likelihood of mispriming with such primers. In
addition, some types of low sequence complexity pose a challenge for
oligonucleotide synthesis. For example, a large number of contiguous
guanosine residues in a primer can lead to poor synthesis yield due to
decreased chemical coupling efficiency. In addition, the resulting primers
can exhibit poor solubility in aqueous media.
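
A simple proxy for low complexity is the length of the longest single-base
run, with a stricter limit for guanosine runs. The sketch below is an
illustration only: the function names and both thresholds are assumptions, and
other complexity measures (for example, repeated di- or trinucleotides) are
not covered.

```python
import re


def longest_run(primer: str, base: str = "") -> int:
    """Longest homopolymer run, optionally restricted to one base."""
    pattern = f"{base}+" if base else r"A+|C+|G+|T+"
    runs = (len(m.group(0)) for m in re.finditer(pattern, primer.upper()))
    return max(runs, default=0)


def is_low_complexity(primer: str, max_any_run: int = 4, max_g_run: int = 3) -> bool:
    """Flag primers with long single-base runs (thresholds are assumptions)."""
    return (longest_run(primer) > max_any_run
            or longest_run(primer, "G") > max_g_run)
```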

5.1.6 Primer melting temperature 

The melting temperature (Tm) is the most important factor in determining
the optimal PCR annealing temperature. An ideal PCR reaction should have
forward and reverse primers with similar Tm values. Tm is determined not
only by primer sequence, but also by other parameters, such as salt
concentration and primer concentration. In recent years, extensive
thermodynamic studies have been carried out to accurately determine oligo Tm
values.

Currently, the following methods for Tm calculation are adopted by most
primer design programs.

The '4 + 2' rule

Tm = 4 * (G + C) + 2 * (A + T)

This is a simple equation based solely on primer GC content. Tm is
calculated by counting the total number of G/C and A/T residues. Each G/C
contributes 4°C and each A/T contributes 2°C to Tm. This method is sometimes
used to quickly approximate Tm values for very short oligos (<10 residues).
However, PCR primers are usually much longer and this Tm calculation
method is not generally recommended.
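
For illustration, the rule translates directly into a few lines of code. The
function name is hypothetical.

```python
def tm_4_plus_2(primer: str) -> int:
    """Approximate Tm (in degrees C) as 4 per G/C plus 2 per A/T residue."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at
```

For the 8-mer ACGTACGT (four G/C and four A/T residues) this gives
4 * 4 + 2 * 4 = 24°C.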

Simple equation based on GC content and salt concentration

Tm = 81.5 + 16.6 * log10[Na+eq] + 0.41 * (%GC) - 600/L

where [Na+eq] is the equivalent sodium molar concentration, (%GC) is the
percent of G/C in the primer, and L is the primer length (Sambrook et al.,
1989). This method is widely used in many computer programs, partially
because it is relatively easy to implement. Despite its simplicity, Tm values
calculated in this way usually have reasonable accuracy and are only a few
degrees different from empirically determined Tm values. In addition, this
method is considered to be the most accurate way to calculate Tm for very
long DNA duplexes, such as PCR amplicons.
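
A minimal sketch of this equation follows. The function name and the default
sodium-equivalent concentration of 0.05 M are illustrative assumptions.

```python
import math


def tm_gc_salt(primer: str, na_eq_molar: float = 0.05) -> float:
    """Tm (degrees C) from GC content, primer length and [Na+eq] in mol/L."""
    primer = primer.upper()
    length = len(primer)
    gc_pct = 100.0 * (primer.count("G") + primer.count("C")) / length
    return 81.5 + 16.6 * math.log10(na_eq_molar) + 0.41 * gc_pct - 600.0 / length
```

As a worked check, a 20-mer with 50% GC at 1 M sodium equivalent gives
81.5 + 0 + 20.5 - 30 = 72.0°C; lowering the salt concentration lowers the
result through the logarithmic term.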



The nearest neighbor method

Tm = ΔH° / (ΔS° + R ln(Ct/4))

where R is the gas constant (1.987 cal/(K·mol)), Ct is the primer
concentration, ΔH° is the enthalpy change, and ΔS° is the entropy change
(Breslauer et al., 1986). The resulting Tm is in kelvin. This is considered to
be the most accurate method to calculate oligo thermodynamic stability, and
thus is the recommended method for primer Tm determination. ΔH° and ΔS° are
calculated using the empirically determined thermodynamic parameters for
neighboring bases in a primer (SantaLucia, Jr., 1998).

The entropy change is significantly affected by salt concentration. Thus,
an entropy correction is required: ΔS° = ΔS° (1 M Na+) + 0.368 (L - 1)
ln[Na+eq], where L is the primer length and [Na+eq] is the equivalent sodium
molar concentration from all salts in a PCR reaction. According to a recent
study (von Ahsen et al., 2001), [Na+eq] can be determined by the following
equation:

[Na+eq] = [Monovalent cations] + 120 √([Mg2+] - [dNTPs])

where monovalent cations are typically present as Na+ and Tris+ in PCR
buffer.
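
The nearest neighbor calculation, the entropy salt correction and the sodium
equivalent can be combined as in the sketch below. Several points are
assumptions rather than statements from this chapter: the parameter table is
the unified set as commonly tabulated from SantaLucia (1998) and should be
checked against the original paper; the sodium-equivalent equation is assumed
to take concentrations in mmol/L; the default primer concentration is
illustrative; and the symmetry correction for self-complementary primers is
omitted. All function names are hypothetical.

```python
import math

# (dH in kcal/mol, dS in cal/(K*mol)) keyed by the 5'->3' dinucleotide on the
# primer strand. Assumed values; verify against SantaLucia (1998).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)   # initiation term for a terminal G/C pair
INIT_AT = (2.3, 4.1)    # initiation term for a terminal A/T pair
R = 1.987               # gas constant, cal/(K*mol)


def sodium_equivalent_mM(monovalent_mM: float, mg_mM: float, dntp_mM: float) -> float:
    """[Na+eq] in mmol/L; units are an assumption (see lead-in)."""
    free_mg = max(mg_mM - dntp_mM, 0.0)  # avoid the square root of a negative
    return monovalent_mM + 120.0 * math.sqrt(free_mg)


def tm_nearest_neighbor(primer: str, primer_conc_molar: float = 250e-9,
                        na_eq_molar: float = 0.05) -> float:
    """Nearest-neighbor Tm in degrees C for a primer containing only A, C, G, T."""
    p = primer.upper()
    dh = ds = 0.0
    for end in (p[0], p[-1]):                 # helix initiation at both ends
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh, ds = dh + h, ds + s
    for i in range(len(p) - 1):               # sum over neighboring base pairs
        h, s = NN[p[i:i + 2]]
        dh, ds = dh + h, ds + s
    ds += 0.368 * (len(p) - 1) * math.log(na_eq_molar)   # salt correction
    tm_kelvin = 1000.0 * dh / (ds + R * math.log(primer_conc_molar / 4.0))
    return tm_kelvin - 273.15
```

For example, `sodium_equivalent_mM(50, 1.0, 0.0)` evaluates to 170 mmol/L,
which would be passed to `tm_nearest_neighbor` as `na_eq_molar=0.170`.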






5.1.7 Primer location in the sequence

Oligo dT and random primers are commonly used in reverse transcription
(RT) to produce cDNA template for real-time PCR. Oligo dT primers anneal
specifically to mRNA poly(A) tails, thus minimizing non-coding cDNA
products. However, this priming strategy introduces 3' bias in cDNA
synthesis because it is difficult to produce full-length cDNAs due to limited
RT extension capability. If oligo dT primers are used in RT, the real-time PCR
primers should be picked from the 3' region of a gene sequence to gain
maximum assay sensitivity.

In contrast, random primers are often used in RT for full transcript
coverage. Random primers can potentially bind to any site in a transcript
sequence. Because all primers are extended toward the 5' end of the
transcript, there will be a linear gradient of sequence representation, with
highest cDNA abundance in the 5' regions. If a random priming strategy is
adopted, real-time PCR primers should be picked close to the 5' end of the
target sequence for maximum sensitivity in real-time PCR.

5.1.8 Amplicon size

PCR efficiency can be affected by amplicon size. Very long amplicons lead
to decreased PCR efficiency. Since PCR efficiency is one of the most
important factors for accurate expression quantification, the amplicon should
be smaller than 250 bp. Typically the size range is 100-250 bp.

5.1.9 Cross-exon boundary 

To minimize the effect of DNA contamination in RNA template, the
forward and reverse primers can be designed from different exons and to
span exon-intron boundaries. In this way one can reduce the genomic DNA
contribution to expression quantitation. This design strategy is most
important for accurate quantitation of low-expressing genes or genes with many
loci in the genome. However, in general, DNA contamination has minimal
effect on real-time PCR because a typical transcript copy number is much
higher than the number of gene loci and standard RNA preparation
procedures remove genomic DNA efficiently. In any case, this strategy may
fail if pseudogenes of the target gene are present in the genome.

5.1.10 Primer and template sequence secondary structures 

Primer or target template secondary structures can retard primer annealing,
leading to decreased amplification efficiency. The likelihood of secondary
structure is greatest in regions rich in complementary base pairing, such as
the stem of a stem-loop structure (Mount, 2001). If part of a primer or target
sequence is inaccessible due to secondary structure formation, the primer
annealing efficiency may decrease dramatically.

In addition, primer secondary structure may lead to primer dimer formation,
which is one of the biggest challenges for accurate quantification by
real-time PCR, especially when DNA intercalating dyes (e.g., SYBR® Green I)
are used. Primer dimers can be produced by primer self-annealing, or by
annealing between the forward and reverse primers. Therefore, the forward
and reverse primers should be evaluated together during design to avoid
potential primer dimer formation.
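
A crude way to evaluate a primer pair together is to ask whether the 3' end of
either primer is perfectly complementary to any site in itself or in its
partner. The sketch below is illustrative only: the function names and the
five-base window are assumptions, and it ignores mismatches and thermodynamic
stability, which a fuller dimer analysis would consider.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def three_prime_anneals(primer_a: str, primer_b: str, window: int = 5) -> bool:
    """True if the last `window` bases of primer_a can pair with a site in primer_b."""
    tail = primer_a.upper()[-window:]
    return reverse_complement(tail) in primer_b.upper()


def pair_has_dimer_risk(fwd: str, rev: str, window: int = 5) -> bool:
    """Check self-annealing and cross-annealing for a forward/reverse pair."""
    primers = (fwd, rev)
    return any(three_prime_anneals(a, b, window) for a in primers for b in primers)
```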

5.1.11 TaqMan® probe design 

Many of the guidelines for primer design are also applicable to TaqMan®
probe design. It is recommended to use Primer Express® software (Applied
Biosystems) for TaqMan® assay design.

• The probe melting temperature in general should be ~10°C higher than
that of the forward or reverse primer.

• Do not put G at the 5' end of the probe as this will quench reporter
fluorescence.

• In general the GC content should be 35-65%.

• The probe should not self-anneal to form secondary structure and
should not be picked from a gene region with high likelihood of
secondary structure. Secondary structure formation may reduce
hybridization efficiency, leading to reduced assay sensitivity.

• The probe should be selected from gene-specific regions with
reasonable sequence complexity to avoid cross-hybridization.

• The probe should be as close to the forward and reverse primers as
possible, without overlapping the primer sequences. The amplicon size
is usually in the range of 60-150 bp.
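
Three of the rules above (Tm offset, 5' residue and GC content) are easy to
check programmatically once Tm values have been computed by some other means.
The following sketch is an illustration under assumptions: the function name,
the use of the higher of the two primer Tm values as the reference, and the
2°C tolerance around the 10°C target are not specified in this chapter.

```python
def taqman_probe_issues(probe: str, probe_tm: float, fwd_tm: float, rev_tm: float,
                        target_gap: float = 10.0, tolerance: float = 2.0) -> list:
    """Return a list of guideline violations for a candidate TaqMan probe."""
    probe = probe.upper()
    issues = []

    # Probe Tm should be about 10 C above the primers (tolerance is assumed).
    gap = probe_tm - max(fwd_tm, rev_tm)
    if abs(gap - target_gap) > tolerance:
        issues.append(f"probe Tm is {gap:.1f} C above the primers, expected ~{target_gap:.0f} C")

    # A 5' G quenches reporter fluorescence.
    if probe.startswith("G"):
        issues.append("G at the 5' end of the probe")

    # GC content should be 35-65%.
    gc_pct = 100.0 * (probe.count("G") + probe.count("C")) / len(probe)
    if not 35.0 <= gc_pct <= 65.0:
        issues.append(f"GC content {gc_pct:.0f}% outside 35-65%")

    return issues
```

An empty list means that none of the three checked rules is violated; the
secondary structure, specificity and location rules need separate tools.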

5.1.12 Molecular beacon probe design

Molecular beacon probes form stem-loop hairpin structures at low
temperature. In this closed state, the fluorophore and the quencher are held
in close vicinity, and thus no fluorescence signals are detected (Tyagi and
Kramer, 1996). The hairpin loop contains gene-specific sequence for
hybridization in real-time PCR. Thus, the rules for TaqMan® probe design
also apply to molecular beacon loop design. The hairpin structure is
disrupted by hybridization to the amplicon, which separates the
fluorophore from the quencher for fluorescence detection. Because of this
unique structural requirement, one major task for molecular beacon probe
design is to identify a suitable hairpin structure that melts 7-10°C higher
than the PCR primers. The Tm of the hairpin stem cannot be calculated
using the Tm formulas for PCR primers because the stem is formed by
intramolecular folding. In general, programs for secondary structure
prediction, such as Mfold (Zuker, 2003), can be used to predict hairpin stem
melting temperature. The stem usually consists of 5-7 base pairs with
75-100% GC content. A G residue should not be placed at the end of the
stem because it will quench the fluorophore.
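
The compositional rules for the stem can be checked with a short function,
leaving the stem melting temperature to a folding program. This sketch makes
several assumptions that are not stated in the text: the fluorophore is taken
to be attached to the 5' arm, so the "end of the stem" is interpreted as the
first base of that arm; the 75-100% GC range is used as the limit; and the
function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def beacon_stem_issues(arm_5p: str, arm_3p: str) -> list:
    """Return guideline violations for the two arms of a molecular beacon stem."""
    arm_5p, arm_3p = arm_5p.upper(), arm_3p.upper()
    issues = []

    # The two arms must pair with each other to close the hairpin.
    if arm_3p != reverse_complement(arm_5p):
        issues.append("arms are not reverse complements")

    # The stem usually consists of 5-7 base pairs.
    if not 5 <= len(arm_5p) <= 7:
        issues.append("stem length outside 5-7 bp")

    # GC content of the stem (assumed limits: 75-100%).
    gc_pct = 100.0 * (arm_5p.count("G") + arm_5p.count("C")) / len(arm_5p)
    if gc_pct < 75.0:
        issues.append(f"stem GC content {gc_pct:.0f}% below 75%")

    # Assumption: the fluorophore sits on the 5' arm, next to its first base.
    if arm_5p.startswith("G"):
        issues.append("G next to the fluorophore")

    return issues
```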

5.2 PrimerBank - an online real-time PCR primer database

5.2.1 Primer design algorithm

We have developed a real-time PCR primer design algorithm based on the
general guidelines described in Section 5.1 (Wang and Seed, 2003a). An
outline of the algorithm is presented in Figure 5.1. The algorithm is
implemented by a design tool called uPrimer, with which we have designed
more than 300,000 primers encompassing most human and mouse genes.
The following comprise detailed descriptions of the properties of these
primers.

Primer specificity

Although primer specificity is one of the most important requirements in
real-time PCR, most primer design programs take only one target sequence
without considering mispriming to off-target templates. At the present
time, to address the mispriming problem, one can design a number of
primer pairs and then individually check cross-matches of each primer with
BLAST (Altschul et al., 1990). However, this screening step is incomplete and
does not consider some important design criteria. For example,
cross-matches at the 3' end of a primer are more likely to produce
non-specific amplicons. In our experience, only about two thirds of the
primers designed in the conventional way can be used in real-time PCR
experiments.



